Implicit Acceleration of Critical Sections via Unsuccessful Speculation

Authors

  • Joseph Izraelevitz
  • Alex Kogan
  • Yossi Lev

Abstract

The speculative execution of critical sections, whether done in hardware transactional memory (HTM) via the transactional lock elision pattern or in software using STM (software transactional memory) or a sequence lock, has the potential to improve software performance with minimal programmer effort. The technique improves performance by allowing critical sections to proceed in parallel as long as they do not conflict at run time. In this work we experimented with software speculative execution of critical sections on the STAMP benchmark suite and found that such speculative executions can improve overall performance even when they are unsuccessful, and in fact even when they cannot succeed. Our investigation used the Oracle Adaptive Lock Elision (ALE) library, which supports the integration of multiple speculative execution methods (in hardware and in software). This software suite collects extensive performance statistics, which shed light on the interaction between these speculative execution methods and their effect on performance. Inspection of these statistics revealed two reasons why unsuccessful speculative executions can accelerate the program. First, they can significantly reduce the time the lock is held during the subsequent non-speculative execution of the critical section by prefetching memory needed for that execution. Second, they affect the interleaving between threads trying to acquire the lock, thus serving as a back-off and fairness mechanism. This paper describes our investigation and demonstrates how these factors affect the behavior of multiple STAMP benchmarks.
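The abstract contrasts speculative execution of a critical section with ordinary lock acquisition, and notes that a failed speculative attempt is followed by a non-speculative execution under the lock. A minimal sketch of that pattern using a sequence lock follows. It is an illustration under stated assumptions, not the ALE library's API: the names `SeqLock`, `read_speculative`, `write`, `max_attempts`, and the `stats` counters are hypothetical; the speculative path only covers read-only critical sections that have no side effects and tolerate seeing inconsistent state; and the sketch relies on CPython's global interpreter lock for visibility, where a native implementation would need explicit memory fences.

```python
import threading


class SeqLock:
    """Hypothetical sequence lock whose readers speculate before locking.

    Illustrative only; this is not the API of the ALE library.
    """

    def __init__(self):
        self._mutex = threading.Lock()
        self._seq = 0  # even: no writer active; odd: a writer is inside
        # Illustrative counters; the speculative-path updates are not
        # synchronized, so they are approximate under heavy concurrency.
        self.stats = {"spec_ok": 0, "spec_fail": 0, "fallback": 0}

    def write(self, update):
        # Writers always take the lock. The sequence number is odd while
        # shared data is being modified and even again once they finish.
        with self._mutex:
            self._seq += 1
            try:
                update()
            finally:
                self._seq += 1

    def read_speculative(self, read, max_attempts=3):
        # Speculative path: run the read-only critical section without
        # the lock, then check that no writer started or finished meanwhile.
        for _ in range(max_attempts):
            start = self._seq
            if start % 2 == 0:
                result = read()
                if self._seq == start:
                    self.stats["spec_ok"] += 1
                    return result
            self.stats["spec_fail"] += 1
        # Non-speculative fallback: hold the lock so no writer can run.
        with self._mutex:
            self.stats["fallback"] += 1
            return read()
```

Bounding the number of speculative attempts with `max_attempts` mirrors the fallback to a non-speculative execution that the abstract describes. In a native implementation, the attempts counted in `spec_fail` are where the prefetching and back-off effects reported above would arise: they touch the data the locked execution will need, and they delay the thread's arrival at the lock.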

Publication date: 2015